the specified region, applicability k is changed in a range of '0' to '1'. More specifically, if the object is judged to be within the transition region at step S113, applicability k is adjusted toward '1' at step S114 in case the object is near the specified region, or it is adjusted toward '0' in case the object is apart from the specified region. If the object is judged to belong to neither the specified region nor the transition region, applicability k is set to '0' at step S115. While a center point of chromaticity is specified as exemplified in FIG. 12, there may be provided such an arrangement as shown in FIG. 14. Referring to FIG. 14, applicability k is set to '1' for an image processing object corresponding to radius 'r0', and under condition that a transition region is provided in a range of radius 'r0' to 'r1', applicability k is gradually adjusted toward '0' in a circumferential region.

At step S110, an applicable correspondence relationship is judged, and at steps S111 to S115, applicability k is determined as mentioned above. According to the present preferred embodiment, there is provided the unit for judging a correspondence relationship using software processing including such procedural steps as steps S110 to S115 mentioned above and hardware for implementation thereof. While applicability k is determined as a result of judgment in each image processing in the present preferred embodiment, it will be obvious to those skilled in the art that an alternative-choice judgment is also feasible in application.

At step S120, image processing is carried out for each object pixel according to applicability k. The following describes more specific conditions of image processing in practicing the present invention.

Contrast indicates a width of luminance in an entire image, and in most cases of contrast adjustment, it is desired to increase a width of contrast. Referring to FIG. 15, there is shown a histogram of statistical luminance distribution of pixels of a certain image. In case of a solid-line curve of narrow distribution, a difference in luminance among bright and dark pixels is relatively small. In case of a dot-dashed-line curve of broad distribution, a difference in luminance among bright and dark pixels is relatively large, which signifies that a width of contrast is increased. Referring to FIG. 17, there is shown a graph indicating a luminance conversion operation for enhancing contrast. Assume that the following equation holds for a relationship between original unconverted luminance y and converted luminance Y:

    Y = ay + b    (9)

Through conversion under condition "a>1", a difference between a maximum value of luminance 'y max' and a minimum value of luminance 'y min' is increased, resulting in a broad distribution of luminance as shown in FIG. 15. In this case, it is preferred to determine slope 'a' and offset 'b' according to luminance distribution. For example, the following equations are given:

    a = 255/(y max - y min)    (10)

    b = -a·y min or 255 - a·y max    (11)

Under the conditions indicated above, a certain narrow luminance distribution can be expanded to a reproducible extent. However, if it is expanded to an extreme end of a reproducible range, highlights may be blown out to white or shadows may be plugged up to black. To prevent this, a non-expandable margin corresponding to a luminance value of approx. '5' is provided at each of the upper and lower limits of the reproducible range. Resultantly, conversion parameters are expressed as follows:

    a = 245/(y max - y min)    (12)

    b = 5 - a·y min or 250 - a·y max    (13)

In this case, conversion is not performed in ranges where "y<5" and "y>250".

In image data conversion, it is not required to perform calculation each time. If a luminance value range is '0' to '255', a result of conversion is predetermined for each luminance value and a conversion table is prepared as shown in FIG. 16. This conversion table is usable just for luminance. In an instance where image data contains direct elements of luminance, the conversion table can be used. Contrarily, in an instance where image data contains indirect elements of luminance, the conversion table cannot be used. In most computer systems, red, green and blue elements of image data are indicated in gradation of brightness, i.e., color gradation data (R, G, B) is used. Since gradation data (R, G, B) does not provide direct values of luminance, it is necessary to perform conversion to Luv color space scheme for determining luminance. This method is, however, disadvantageous since a large amount of calculation is required. Therefore, the following conversion expression is used, which is commonly adopted in television processing for directly determining luminance from RGB data:

    y = 0.30R + 0.59G + 0.11B    (14)

On a principle that linear conversion can be made between gradation data and luminance y as indicated above, Equation (9) is applicable to a relationship between original unconverted gradation data (R0, G0, B0) and converted gradation data (R1, G1, B1). Then, the following expressions are given:

    R1 = aR0 + b    (15)

    G1 = aG0 + b    (16)

    B1 = aB0 + b    (17)

Consequently, the conversion table shown in FIG. 16 is applicable to gradation data conversion.

The following describes an image processing technique for adjusting brightness. As in the above-mentioned case of contrast adjustment, a histogram of brightness distribution is assumed. Referring to FIG. 18, a solid-line curve indicates a luminance distribution which has its peak inclined toward a dark level. In this case, the peak of entire distribution is shifted toward a bright side as indicated by a broken-line curve. Referring to FIG. 19, a solid-line curve indicates a luminance distribution which has its peak inclined toward a bright side. In this case, the peak of entire distribution is shifted toward a dark side as indicated by a broken-line curve. In these cases, linear luminance conversion as shown in FIG. 17 is not performed, but γ-curve luminance conversion is performed as shown in FIG. 20.

In γ-curve correction, entire brightness is increased when "γ<1", and it is decreased when "γ>1". A degree of this correction can be adjusted gradually by clicking an up-arrow or down-arrow of the BRIGHTNESS adjustment item on the processing menu area 43 shown in FIG. 7 as many times as required at step S100.

As in contrast adjustment, it is also possible to set up a value of γ automatically. As a result of our various experiments, it has been found that the following approach is advantageous: In a luminance distribution, median 'y med' is predetermined. If it is less than '85', an image of
interest is judged to be too dark and γ correction is made according to a γ value indicated below.

    γ = y med/85    (18)

or

    γ = (y med/85)**(1/2)    (19)

Note, however, that even if "γ<0.7", a value of γ is set to 0.7 forcedly. Unless this kind of limit is provided, a night-scene image appears to be of daytime. If an image is brightened excessively, it becomes whitish entirely, resulting in low contrast. In this case, it is preferred to perform such a processing operation as enhancement of saturation in combination.

If median 'y med' is larger than '128', an image of interest is judged to be too bright and γ correction is made according to a γ value indicated below.

    γ = y med/128    (20)

or

    γ = (y med/128)**(1/2)    (21)

In this case, a limit is also provided so that a value of γ is set to 1.3 forcedly even if "γ>1.3" for the purpose of preventing the image of interest from becoming too dark.

For this γ correction, it is preferred to provide such a conversion table as shown in FIG. 16.

In edge enhancement processing for adjusting sharpness of an image, with respect to original non-enhanced luminance Y of each pixel, enhanced luminance Y' is calculated as expressed below.

    Y' = Y + Eenhance·(Y - Yunsharp)    (22)

where 'Eenhance' indicates a degree of edge enhancement, and 'Yunsharp' indicates unsharp-mask processing for each pixel of image data. The following describes the unsharp-mask processing: Referring to FIG. 21, there is shown an example of an unsharp mask 60 comprising 5x5 pixels. The unsharp mask 60 is used in summation in such a manner that a center value of '100' is assigned as a weight to a processing object pixel 'Y(x,y)' in dot-matrix image data and a weight corresponding to a value in each array box of the unsharp mask 60 is assigned to each circumferential pixel thereof. In use of the unsharp mask 60, summation is performed based on the following expression:

    Yunsharp(x,y) = (1/396) Σij (Mij × Y(x+i, y+j))    (23)

In Equation (23), '396' indicates a total value of weighting factors. For an unsharp mask having a different size, a total of values in array boxes thereof is indicated. 'Mij' represents a weighting factor indicated in each array box of an unsharp mask, and 'Y(x,y)' represents each pixel of image data. In the unsharp mask 60, 'i' and 'j' indicate coordinates on horizontal and vertical axes thereof.

Edge enhancement calculation based on Equation (22) provides the following functional meaning: 'Yunsharp(x,y)' is the result of addition in which a weight assigned to each circumferential pixel is reduced with respect to a pixel of interest, and accordingly an image is unsharpened through processing. Such an unsharpening operation is functionally equivalent to low-pass filtering. Therefore, "Y(x,y)-Yunsharp(x,y)" signifies that a low-frequency component is subtracted from each of original components, i.e., it is functionally equivalent to high-pass filtering. If a high-frequency component subjected to high-pass filtering is multiplied by a degree of edge enhancement 'Eenhance' and a result value of this multiplication is added to "Y(x,y)", it means that a high-frequency component is increased in proportion to the degree of edge enhancement 'Eenhance'. Thus, edge enhancement is accomplished. Since edge enhancement is required only for an edge part of an image, it is possible to reduce the amount of processing substantially by performing it only in case that there is a large difference between adjacent pixels of image data.

In this case, the degree of edge enhancement 'Eenhance' can be adjusted by clicking an up-arrow or down-arrow of the SHARPNESS adjustment item on the processing menu area 43 shown in FIG. 7 as many times as required at step S100. Still more, it is possible to set up the degree of edge enhancement 'Eenhance' automatically.

At an edge part of an image, a difference in gradation data increases between adjacent pixels. This difference represents a gradient of luminance, which is referred to as a degree of edging. In an image, a degree of variation in luminance can be calculated by determining each of horizontal-direction and vertical-direction vector components. Although an object pixel in an image containing dot-matrix pixels is adjacent to eight pixels, a degree of variation is determined only with respect to adjacent pixels in horizontal and vertical directions for the purpose of simplifying calculation. Summation is performed on length values of respective vectors to represent a degree of edging 'g' for the object pixel of interest, and a sum value of edging degree is divided by the number of pixels to attain an average value. Assuming that the number of pixels is indicated as 'E(I)pix', a degree of sharpness 'SL' of an object image can be calculated as expressed below.

    SL = Σ|g| / E(I)pix    (24)

In this case, as a value of SL decreases, the degree of sharpness becomes lower (blurring). As a value of SL increases, the degree of sharpness becomes higher (clearer imaging).

Since sharpness of an image depends on a visual sense of an individual person, a degree of sharpness SL is determined similarly using image data which has optimum sharpness attained experimentally. A value thus determined is set as an ideal level of sharpness 'SLopt', and a degree of edge enhancement 'Eenhance' is determined as expressed below.

    Eenhance = ks·(SLopt - SL)**(1/2)    (25)

where coefficient 'ks' varies with a size of image. In case that image data contains 'height' dots and 'width' dots in vertical and horizontal directions respectively, coefficient 'ks' can be determined as indicated below.

    ks = min (height, width)/A    (26)

where 'min (height, width)' indicates the number of 'height' dots or the number of 'width' dots, whichever is smaller, and 'A' is a constant value of '768'. It is to be understood that the above value has been attained from experimental results and may be altered as required. Basically, as an image size increases, it is advisable to increase the degree of edge enhancement.

In the above-mentioned fashion, edge enhancement processing can be carried out in manual or automatic setting.

The following describes an image processing technique for adjusting saturation. In case of saturation adjustment
on the computer 21. Note that a flowchart exemplified in FIG. 26 is for color adjustment processing to provide clear flesh color.

In the color adjustment processing, statistical calculation is performed on flesh-color-like pixels according to chromaticity of each pixel. As shown in FIG. 6, each object pixel is moved for statistical calculation on all the pixels.

First, at step S210, chromaticity "x-y" of each pixel is calculated. As in the case of the example in the foregoing description, flesh color is identified if the following expressions are satisfied.

    0.35 < x < 0.40    (5)

    0.33 < y < 0.36    (6)

At step S220, it is judged whether or not chromaticity "x-y" converted according to each pixel of RGB gradation data is in a predefined flesh color range. If it is in the flesh color range, statistical calculation is performed on each pixel of color image data at step S230. This statistical calculation signifies simple addition of RGB gradation data values. The number of pixels is also counted to determine an average value for pixels judged to have flesh color, which will be described in detail later.

Thereafter, regardless of whether or not each object pixel is judged to have flesh color, each object pixel is moved at step S240. Thus, the above-mentioned sequence is repeated until it is judged at step S250 that processing for all the pixels is completed. On completion of processing for all the pixels, step S260 is performed to divide statistical result data by the number of pixels for determining an average value (Rs.ave, Gs.ave, Bs.ave).

Software processing for "x-y" chromaticity calculation at step S210 and hardware for execution thereof provide the unit for judging chromaticity. It is judged at step S220 whether or not chromaticity "x-y" is in a predetermined object range, and if the predetermined object range is satisfied, statistical calculation is performed on color image data at step S230. Then, at steps S240 and S250, each object pixel is moved until all the pixels are taken. At step S260, statistical result is divided by the number of pixels to determine an average value. These software processing operations and hardware for execution thereof provide the unit for statistical calculation of object chromaticity pixels.

As to pixels having preferable flesh color, an ideal value (Rs.ideal, Gs.ideal, Bs.ideal) is predefined. In terms of memory color in psychology, each ideal value is different from result of actual measurement. In an example of flesh color, a person tends to have an optical illusion that slightly deviated flesh color is true rather than flesh color conforming to actual measurement result. This optical illusion is based on a stereotyped recognition of flesh color on photographs and pictures, i.e., it is referred to as a memory color effect in psychology. In the present invention, an ideal value is predefined in consideration of such a memory color effect so that color adjustment is made to eliminate a difference from the ideal value. Therefore, the ideal value may be in a wide target range expected without being biased to actual color.

Regarding flesh color pixels, a difference between an average value (Rs.ave, Gs.ave, Bs.ave) of RGB gradation data and an ideal value (Rs.ideal, Gs.ideal, Bs.ideal) predefined for preferable flesh color represents a degree of deviation in color image data fundamentally.

However, it is not preferred to apply the difference as a degree of color adjustment intactly. For instance, if the same degree of color adjustment is applied to all the pixels, a flesh color part may become satisfactory but colors of pixels on any parts other than the flesh color part may be affected significantly.

In the present preferred embodiment, therefore, a ratio of the number of flesh color pixels to the total number of pixels (flesh color ratio) is determined at step S270 for regulating a degree of color adjustment. Degrees of adjustment of primary colors ΔR, ΔG and ΔB are expressed as shown below.

    ΔR = ks (Rs.ideal - Rs.ave)    (55)

    ΔG = ks (Gs.ideal - Gs.ave)    (56)

    ΔB = ks (Bs.ideal - Bs.ave)    (57)

In these equations, a value of flesh color ratio 'ks' is determined as follows:

    ks = (Number of flesh color pixels/Total number of pixels)

A degree of color adjustment thus attained is not applied intactly to color image data adjustment. In the present preferred embodiment, a tone curve is prepared using the degree of color adjustment at step S280. FIG. 27 is a diagrammatic illustration showing tone curves prepared in the present embodiment.

A tone curve represents an input-output relationship where RGB gradation data is converted with a degree of enhancement regulated. In an example of 256 gradations ranging from levels '0' to '255', a spline curve is drawn with respect to three identified output value points corresponding to gradation level '0', gradation level '255' and a certain medium gradation level therebetween. Assuming that medium gradation level '64' is taken and output values are '0', '64' and '255', there is a coincidence in an input-output relationship even if input values are '0', '64' and '255', resulting in a tone curve being straightened. However, if output value '64' is not provided for input value '64', a gentle curve as shown in FIG. 27 is drawn to set up an input-output relationship. In the present preferred embodiment, a control point corresponding to the average value (Rs.ave, Gs.ave, Bs.ave) of RGB gradation data is used as a medium gradation level, and respective degrees of color adjustment ΔR, ΔG and ΔB are reflected in formation of a tone curve. In this fashion, the control point is changed so that each ideal value (Rs.ideal, Gs.ideal, Bs.ideal) is met when the flesh color ratio 'ks' is '1'.

At step S290, element colors of color image data are converted for all the pixels again using a tone curve thus attained to accomplish color adjustment of the color image data.

Software processing at step S270 where color adjustment is made according to a ratio of the number of object pixels to the total number of pixels while determining a difference between a statistical result value and an ideal value, software processing at step S280 where a tone curve is formed according to a determined degree of color adjustment, and hardware for execution thereof provide the unit for judging a degree of color adjustment. Software processing at step S290 where color image data is converted and hardware for execution thereof provide the unit for adjusting color.

Having described the present preferred embodiment as related to flesh color adjustment for the purpose of simplicity in explanation, it is to be understood that color adjustment is not limited to flesh color. In consideration of a memory color effect in psychology, it is often desired to attain more vivid green color of tree leaves and clearer blue color of sky in addition to clearer flesh color through color
adjustment processing. Referring to FIG. 28, there is shown a modified embodiment in which an object of adjustment is selectable.

In the example shown in FIG. 28, an object of color adjustment is selected first at step S305. In use of the computer 21, a window shown in FIG. 29 is presented on the display monitor 32 so that a human operator can select an object of color adjustment. The window shown in FIG. 29 is provided with a flesh color adjustment item for clearer flesh color, a green color adjustment item for more vivid green color of tree leaves, and a blue color adjustment item for clearer sky blue, each of which has a check box for allowing individual selection. In this example, duplicate selection is also permitted. When the operator turns on a desired check box and clicks the 'OK' button, a flag for each object thus specified is set up to start a loop processing of steps S310 to S350.

In the loop processing, while moving an object pixel, statistical calculation is performed on each pixel through determining chromaticity as in the foregoing description. At step S310, object pixel chromaticity "x-y" is determined using Equations (1) to (4). Then, at step S315, a flesh color adjustment flag which has been set up at step S305 is referenced to judge whether or not the operator has selected the flesh color adjustment item. If it has been selected, statistical processing for flesh color pixels is performed. This statistical processing is carried out in the same manner as at steps S220 and S230 in the previous example. If chromaticity "x-y" determined at step S310 is in a predefined possible chromaticity range corresponding to flesh color, statistical calculation is performed for each element color of RGB gradation data.

At step S325, a green color adjustment flag is referenced to judge whether or not the operator has selected the green color adjustment item in the same manner as for flesh color adjustment. If the green color adjustment item has been selected, object pixel chromaticity "x-y" is checked to judge whether or not it is in a predefined possible chromaticity range corresponding to green color of tree leaves. If it is in the corresponding predefined chromaticity range, statistical calculation is performed at step S330. This statistical calculation is carried out in an area different from that subjected to flesh color statistical calculation.

Then, at step S335, the blue color adjustment item is checked to form a judgment in the same manner, and at step S340, statistical calculation is performed on another area.

At step S345, each object pixel is moved, and the above-mentioned sequence is repeated until it is judged at step S350 that processing for all the pixels is completed. In this modified embodiment, a plurality of objects may be selected for color adjustment. Even in such a situation, statistical calculation is performed at steps S315 to S340 if chromaticity of each object pixel is in a predefined chromaticity range. Therefore, these processing operations provide the unit for statistical calculation of object chromaticity pixels.

At steps S355 to S365 after completion of chromaticity statistical calculation on all the pixels, a degree of adjustment for each color is calculated according to result of statistical calculation. Unlike the previous example, a processing operation for determining an average value from statistical calculation result is performed simultaneously with calculation of a degree of each color adjustment in this modified embodiment, and it is possible to modify relevant calculation procedures as required. As described in the previous example, each adjustment of flesh color, green color and blue color is carried out in the same manner. That is, an average value is calculated according to statistical calculation result, a difference between it and an ideal value predefined for preferable color is determined, and multiplication by a flesh color ratio, green color ratio or blue color ratio is performed for regulating a degree of each color adjustment.

Upon completion of step S365, there are provided three kinds of degrees of color adjustment since degrees of flesh color adjustment, green color adjustment and blue color adjustment have been determined through respective statistical calculations. Therefore, in the modified embodiment, adjustment is performed to reflect results of these statistical calculations inclusively. More specifically, in a situation of application of processing objects, degrees of color adjustment for respective processing objects ΔR, ΔG and ΔB are determined as expressed below.

    ΔR = Σ ki (Ri.ideal - Ri.ave)    (58)

    ΔG = Σ ki (Gi.ideal - Gi.ave)    (59)

    ΔB = Σ ki (Bi.ideal - Bi.ave)    (60)

where

    Σ ki <= 1

    i = 1: flesh color
    i = 2: green color
    i = 3: blue color

In this case, it is assumed that no duplicate counting is made.

At step S370, a tone curve is formed according to each of degrees of color adjustment ΔR, ΔG and ΔB thus determined, as shown in FIG. 27. In this case, control points are indicated by Σ ki·Ri.ave, Σ ki·Gi.ave, Σ ki·Bi.ave. Software processing operations at steps S355 to S370 provide the unit for judging degree of color adjustment. After formation of each tone curve, color image data is adjusted at step S375.

The above-mentioned color adjustment device may also be implemented as a printer driver. In most cases, a printer driver is not capable of temporarily storing data in an output process after processing of input data. Hence, there is a certain limitation in functionality for changing processing conditions according to each region divided as desired. However, by setting up degrees of color adjustment for a plurality of elements as shown in Equations (58) to (60), it is possible to carry out effective color adjustment even for the printer driver having such a functional limitation.

The following describes operations of a preferred embodiment arranged with a printer driver.

As in the previous exemplary embodiment, it is assumed that a photographic color image shown in FIG. 30 is read in using the scanner 11 and printed out using the printer 31. First, under condition that the operating system 21a is run on the computer 21, the color adjustment application 21d is launched to let the scanner 11 read in the photographic image. When the photographic image thus read in is taken into the color adjustment application 21d under control of the operating system 21a, an object pixel applicable to processing is set at an initial position. Then, at step S210, chromaticity "x-y" of each pixel is calculated using Equations (1) to (4). At step S220, it is judged whether or not each
of values 'x' and 'y' is in a predefined flesh color chromaticity range. If it is in the flesh color chromaticity range, statistical calculation is performed on each pixel of color image data for each element color at step S230. In the photographic image shown in FIG. 30, pixels of a person's hands, legs or face can be judged to have flesh color. In this example, statistical calculation is performed on a few percent of all the pixels as flesh color pixels. Then, at step S240, each object pixel is moved. Thus, the above-mentioned sequence is repeated until it is judged at step S250 that processing for all the pixels is completed. After completion of processing for all the pixels, statistical result data is divided by the number of flesh color pixels to determine an average value at step S260. At step S270, a difference between an ideal value of flesh color and the average value of flesh color pixels is determined, and it is multiplied by a flesh color ratio that represents a ratio of the number of flesh color pixels to the total number of pixels. At step S280, a tone curve is formed accordingly. Then, at step S290, based on the tone curve, each element color of color image data is converted. Since the degree of color adjustment is set at a moderate level with respect to the ideal value of flesh color in consideration of the ratio of the number of flesh color pixels to the total number of pixels, preferable color can be attained through proper color adjustment.

In the example shown in FIG. 30, a difference between an average value of flesh color pixels attained in statistical calculation and an ideal value of flesh color is multiplied by a flesh color ratio indicating a few percent value for regulating a degree of color adjustment. According to the regulated degree of color adjustment, a tone curve is formed for accomplishing color adjustment.

Still more, if a flesh color adjustment item, green color adjustment item and blue color adjustment item are selected in adjustment object selection as exemplified before, chromaticity "x-y" is calculated for all the pixels at step S310. Then, at steps S315 to S340, individual statistical calculation is performed for each adjustment object. In the example shown in FIG. 30, on a flesh color part of a person's image, a green color part of tree leaves and a sky-blue part of a background, each chromaticity "x-y" is applicable to each object range for statistical calculation.

After completion of processing for all the pixels, a degree of color adjustment for each object is determined in consideration of an occupancy ratio of object pixels at steps S355 to S365. At step S370, a tone curve is formed through regulating each degree of color adjustment thus determined. Then, at step S375, color adjustment is carried out for all the pixels of color image data. In this manner, flesh color adjustment for attaining clearer flesh color, green color adjustment for attaining more vivid green color of tree leaves, and blue color adjustment for attaining clearer sky blue in background are carried out through regulation according to the occupancy ratio of object pixels of each color.

After color adjustment thus accomplished, a color image is displayed on the display monitor 32 through the display driver 21c, and then if the color image thus displayed is satisfactory, it is printed out onto the printer 31 through the printer driver 21b. More specifically, the printer driver 21b receives RGB gradation image data which has been subjected to color adjustment, performs resolution conversion as predetermined, and carries out rasterization according to a print head region of the printer 31. Then, the image data thus rasterized is subjected to RGB-to-CMYK color conversion, and thereafter, CMYK gradation image data is converted into binary image data for output onto the printer 31.

Through the above-mentioned processing, the photographic color image read in using the scanner 11 is automatically subjected to optimum color adjustment. Thereafter, it is displayed on the display monitor 32, and then printed out onto the printer 31.

As set forth hereinabove, on the computer serving as the nucleus of color adjustment, chromaticity "x-y" of each pixel is calculated at step S210, and statistical calculation is performed at steps S220 to S230 if a value of chromaticity thus calculated is in a chromaticity range predefined for each color. After completion of statistical calculation on all the pixels, an average value is determined at step S260, and a degree of each color adjustment is calculated while taking account of an occupancy ratio of object pixels of each color at step S270. In this fashion, accurate statistical calculation is performed on color pixels to be adjusted independently of brightness, and a degree of each color adjustment is regulated by taking account of the number of pixels of each color in terms of occupancy ratio, thereby making it possible to carry out optimum color adjustment processing without giving an adverse effect on colors of pixels surrounding object pixels.

The invention may be embodied in other specific forms
without departing from the spirit or essential characteristics
thereof. The present embodiments are therefore to be con-
sidered in all respects as illustrative and not restrictive, the
scope of the invention being indicated by the appended
claims rather than by the foregoing description and all changes
which come within the meaning and range of equivalency of
the claims are therefore intended to be embraced therein.

What is claimed is: 

1. A color adjustment device for performing color sepa- 
ration of color image data for each predetermined element 
color and adjusting said color image data in enhancement to 
provide each desired color in result of color image output 
delivered for each element color by an image output device, 
comprising: 

a chromaticity judging unit which determines a value of 
chromaticity of each pixel according to said color 
image data; 

an object chromaticity pixel statistical calculation unit
which performs statistical calculation on pixels having
chromaticity values determined, by said chromaticity
judging unit, to meet a predetermined range of chro-
maticity;

a color adjustment degree judging unit which determines
a degree of color adjustment so as to eliminate a
difference between a predetermined optimum value for
pixels meeting said predefined range of chromaticity
and a result value attained in said statistical calculation
and for regulating said degree of color adjustment
according to an occupancy ratio of pixels subjected to
said statistical calculation to the total number of pixels;
and

a color adjusting unit which carries out color adjustment of
said image data according to the regulated degree of
color adjustment;
wherein, for each given pixel of the color image data: 

said chromaticity judging unit determines a value of
chromaticity of said given pixel,

when said judged value of chromaticity of said given
pixel meets a predetermined range of chromaticity,
the object chromaticity pixel statistical calculation
unit performs a statistical calculation with respect to
the given pixel;

after said statistical calculation, said color adjustment 
degree judging unit determines a degree of color 
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Abstract 

We propose a fast face detection algorithm that works
directly on the compressed DCT domain. Unlike the previous
DCT domain processing designs that are mainly based on
skin-color detection, our algorithm analyzes both color and
texture information contained in the DCT parameters, and
can therefore generate more reliable detection results. Our
texture analysis is mainly based on statistical model
training and detection. A number of
fundamental problems, e.g., block quantization, prepro- 
cessing in the DCT domain, and feature vector selection 
and classification in the DCT domain, are discussed. 
Key words: face detection, DCT, JPEG, MPEG. 

1 Introduction 

Human face detection is an interesting research topic.
In the literature, many works have been reported with
different application backgrounds. They could be clas-
sified into two groups, i.e., color based approaches and
texture based approaches.

Color based approaches have been popular in the
multimedia community because of their relatively simple
design and fast performance. In general, this type of
algorithm tries to model human skin color in different
chromatic spaces (RGB, YCbCr, HSV, etc.) with various
statistical models. Typical skin color modeling works
include: mixture of Gaussians [13], linear region
approximation [7], Bayesian minimal cost decision rule [18],
etc. In addition to color modeling, a number of recent
works seek to include additional heuristics such as
texture, symmetry, region ratio, region segmentation and
merging, etc. [2, 19, 1]. In order to further improve
the processing speed, Wang and Chang [18] proposed to
detect human faces directly on the compressed MPEG
macroblocks. In their work, JPEG pictures and MPEG
I frames are partially decoded (entropy decoded and
de-quantized) to restore DCT parameters in their block
structure. Their algorithm then works directly on the
decoded DCT parameters. Color information is used
as the major detection clue in their algorithm. A skin
color model is created at the macroblock level in the
YCbCr color space. In addition, they also use some
texture information by grouping the DCT parameters
into bins and evaluating the energy distribution
patterns based on bin statistics. This algorithm is fast,
but it shares the same problem as the typical color-based
algorithms designed in the pixel domain, i.e., it has a
good detection rate but suffers from high false alarm
rates when the backgrounds have skin-like colors.

In contrast, texture based works detect faces directly
from the picture grayscale information. Most texture
based methods are developed from face recognition
algorithms, because in the recognition sense, color
information does not help much in distinguishing different
individuals' faces. Instead, it is the face texture that
we use to recognize different persons. Typical examples^1
based on this observation include Sung and Poggio [16]'s
system developed at the MIT AI lab and Rowley and
Kanade [15]'s system developed at the CMU Robotics
Institute. Both systems use similar preprocessing
algorithms and multi-scale searching mechanisms, except
that Sung and Poggio's system uses a multimodal Gaussian
clustering method as classifier while Rowley and
Kanade's system uses a neural network as classifier.
Similar systems available in the literature also include:
Lew [9]'s work that uses information theory, Collobert et
al. [5]'s and McKenna [10]'s works that use neural networks,
Yang and Ahuja [20]'s work that uses a factor analyzer,
and Osuna et al. [14]'s work that uses a support vector
machine (SVM) as classifier. They all reported
close detection performance on some common testing
pictures, for example the CMU testing database. In
general, texture-based algorithms are more reliable than
color based algorithms, but they are also more complex
and slower.

In this paper, we propose a face detection algorithm
that combines both color and texture information in
order to find a good balance between speed and detection
reliability. We design the algorithm to work on the
compressed DCT domain in a similar way as Wang [18]'s
work. However, our work extends theirs by building up
an independent texture-based detection module in the
compressed DCT domain, i.e., we study how to map
the successful texture-based face pattern detection
algorithms such as [16, 15, 20, 14] from the traditional pixel
domain to the DCT transform domain. With the new
texture-based detection model included, face detection
problems can be solved more reliably in the compressed
domain, as compared with Wang and Chang [18]'s work.
We believe this algorithm is especially valuable for fast
content analysis of large amounts of visual media data
stored in compressed formats such as JPEG and
MPEG. In the rest of this paper, when mentioning
the compressed DCT domain, we refer to JPEG pictures
and MPEG I frames that are partially decoded (entropy
decoded and de-quantized) and have their DCT
parameters restored in 8 pixel by 8 pixel block structures,
unless stated otherwise.

^1 We do not intend to give a comprehensive survey of face
detection algorithms. Instead only a group of similar algorithms
that our work is based on is covered.

The structure of this paper is as follows. First in
Section 2, we give a simple survey of the texture-based
face detection design in the pixel domain. In Section 3
we discuss in detail the texture-based face detection
algorithm design in the compressed DCT domain. In
Section 4, the texture-based face model is extended to
include color information and a new combined texture-based
and color-based face detection algorithm is developed
with experimental results. Finally in Section 5 we
conclude the paper.

2 Texture Based Face Detection in the Pixel Domain 

In this work, we seek to map the successful texture-
based face detection algorithms from the original pixel
domain to the transformed DCT domain. Therefore,
before going directly to the DCT domain, we first take
a close look at the available works in the pixel domain.

In the available face detection works designed in the
pixel domain, successful algorithms such as [16, 15, 20,
14] generally share similar processing procedures and
structures. In these works, the face pattern is
represented as a rectangular or square window of the pixel
plane. In order to detect face patterns, the systems scan
the input image plane at every location. That is, the image
is divided into multiple (possibly overlapping) subimages
of the model window size. Each window is compared
with a previously trained "face" model to tell whether
it is a face pattern or not. In order to detect faces in
multiple scales, the input image is downscaled to a
series of scales (for example, by scales of 1.2^n, n = 1, 2, ...)
and the detection process is repeatedly applied to each
scale. In other words, to tell whether a windowed image
is a face or not, the window is always first scaled to the
size of the model, and then compared with it.
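
The scanning procedure described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of any of the cited systems: the function names, the nearest-neighbour downscaling, the stride parameter, and the choice to also scan the original resolution (n = 0) are assumptions made for the example.

```python
import numpy as np

def downscale_nearest(image, factor):
    """Shrink a 2-D grayscale image by `factor` using nearest-neighbour sampling."""
    h, w = image.shape
    new_h, new_w = int(h / factor), int(w / factor)
    rows = np.minimum((np.arange(new_h) * factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) * factor).astype(int), w - 1)
    return image[np.ix_(rows, cols)]

def scan_windows(image, model_size=19, scale_step=1.2, stride=1):
    """Yield (scale factor, top, left, window) for every scale and location."""
    n = 0
    while True:
        factor = scale_step ** n
        scaled = downscale_nearest(image, factor)
        h, w = scaled.shape
        if h < model_size or w < model_size:
            break                                  # image is now smaller than the model
        for top in range(0, h - model_size + 1, stride):
            for left in range(0, w - model_size + 1, stride):
                window = scaled[top:top + model_size, left:left + model_size]
                yield factor, top, left, window    # each window would go to the classifier
        n += 1
```

Every yielded window has the model size, so a single trained model can be applied at all scales; the cost is that the number of windows grows quickly with the image size.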

The details of this processing flow chart are
illustrated in Fig. 1. As we can see from Fig. 1, to detect
faces, the input image is scaled and cropped to
generate a windowed image pattern. The pattern is
processed with a preprocessing module to remove
illumination noise and normalize the grayscale ranges. The
normalized image pattern is then fed to a classifier to see
if it is a face pattern. To create a face model, a database
of frontal faces is generally used. Each face is scaled and
moved to align its common facial feature points, such as
the corners of the eyes, the tip of the nose, etc., to
specified positions in the model window. The windowed image
pattern is then processed by a preprocessor, and finally
applied to train a classifier.

Figure 1: Illustration of common face detection proce-
dures in the pixel domain

In all four detection systems [16, 15, 20, 14],
the basic structures are generally the same (as
indicated by the modules covered by the dashed rectangle on the
left (Group A) of Fig. 1). The only different part is the
classifier, i.e., the classifier's internal structure, its training
and detection operation (as indicated by the modules
covered by the dashed rectangle on the right (Group B) of
Fig. 1).

Based on the summary depicted in Fig. 1, the
detection complexity is M times the processing in
Group A plus the processing in Group B. M depends
on the image size, the model size, and the range of the
face sizes that are to be detected by the system.

However, the essence of the complexity issue is the
size of the face model, which determines directly the
complexity of the classifier, i.e., how complex the
classifier should be in order to separate the face patterns
from the nonface patterns. Though there is no clear
measure available to judge the complexity of face
pattern classification, most reported papers use similar face
model sizes. For example, Sung [16] uses a (masked)
19-pixel by 19-pixel square window, Rowley [15] uses a
(masked) 20-pixel by 20-pixel square window, and
Collobert et al. [5] use a 15-pixel by 25-pixel rectangular
window. Therefore, the face detection problem is (at
most)^2 a 2-dimensional pattern classification problem
at the size of about 20 pixels by 20 pixels. If we stack
up the pixels row by row as in [16], the problem is then
converted to a 1-dimensional pattern classification
problem at the size of about 200-400 dimensions.

3 Texture Based Face Detection in the DCT domain 

3.1 Feature Representation in the Block DCT Do- 
main 

3.1.1 The DCT Transform 

In principle, pattern detection models should not be
influenced by converting the problem to the DCT domain
if we do not consider the blocks and quantization
errors incurred. Because DCT is an orthonormal
transform, both the Euclidean distance and the Mahalanobis
distance are unchanged after the transform. If we follow
the Gaussian clustering approach as discussed in [16],
it is easy to prove that the Gaussian model remains a
Gaussian model in the DCT domain. In addition to
the invariant features, the DCT domain is better than
the pixel domain for pattern classification problems in
that the DCT transform reduces the dependence among
individual components and compresses the feature energy
to the low frequency parameters. Therefore, it is much
easier to choose feature components from DCT
parameters than from direct pixel values.

^2 Sung [16]'s work indicates that the classification problem can
be projected to subspaces in much lower dimensions. But there is
no clear quantitative boundary on how low this dimension could
be.

Figure 2: Illustration of the block quantization problem
in face detection.
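
The distance-preservation property can be checked numerically. The sketch below builds the orthonormal 8-point DCT-II matrix with NumPy and applies it separably to a block; the function names `dct_matrix` and `dct2` are chosen for this illustration only.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix D, so that D @ D.T is the identity."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)         # DC row has a different normalisation
    return d

def dct2(block):
    """Separable 2-D DCT of a square block."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T

rng = np.random.default_rng(0)
a, b = rng.random((8, 8)), rng.random((8, 8))
pixel_dist = np.linalg.norm(a - b)             # Euclidean distance in the pixel domain
dct_dist = np.linalg.norm(dct2(a) - dct2(b))   # same distance in the DCT domain
```

Because the transform matrix is orthonormal, `pixel_dist` and `dct_dist` agree up to floating-point rounding, which is the sense in which a distance-based model carries over to the DCT domain when blocking and quantization are ignored.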

3.1.2 Block Quantization Problem 

However, to apply model-based algorithms directly to
the compressed domain of JPEG and MPEG, a major
problem to overcome is that the image frames are
divided into 8 by 8 blocks before the DCT transform.
Therefore, any detection work based on DCT parameters has
to be done at the locations of blocks rather than pixels.
That is, the blocks reduce the spatial resolution of the
system by a factor of 8, which makes it hard to detect small
faces without fully decoding the image from the block based
DCT parameters to pixels. We refer to this problem as
block quantization in this work.

More specifically, the block quantization problem
influences face detection in the following three aspects.

First, because the DCT parameters are organized
in a block structure, in order to detect a face, a face
model cannot be used to search the picture pixel by
pixel. Instead, this search can only be done block by block
in the DCT domain. This problem is better illustrated
in Fig. 2, in which the rectangle in light solid lines
represents the actual location of a face, while the rectangle in
dark solid lines is the closest searching window that the
system can arrive at based on 8 by 8 block quantization.
Therefore, in order to detect faces that are not aligned
with block positions, we need to introduce a certain
translation invariant feature in face model training. In
other words, when training the face model, more image
patterns should be included as positive training
examples than in the corresponding procedure designed in
the pixel domain.

Second, in addition to the feature aligning problem,
block quantization also introduces some background noise
when the searching window is not well aligned with the
actual face region, which influences the accuracy of both
model training and detection.

Figure 3: Illustration of the feature vector creation from
DCT parameters

Third, in the block-based DCT domain, it is hard to
obtain resolution transformation. Though several
papers [8, 11, 4] have discussed the issue of fast resolution
transforms in the compressed DCT domain, it is still too
expensive to carry out resolution transforms in arbitrary
ratios, e.g., to down-sample the image by 1.2.
Therefore, in order to detect faces in multiple scales, we
have to design models individually for each scale. We
call this solution a multi-model approach in this work,
as compared to the multi-scale approach commonly used
in the original pixel domain.

To sum up, block quantization reduces the
distribution density of the face patterns in the feature space,
and also introduces background noise and the scaling
problem. Therefore, detecting human face patterns
within the block based DCT domain is more difficult
than doing so in the pixel domain.

3.1.3 Feature Vector Design 

In this work, we design face detection as a 1-dimensional
vector classification problem similar to Sung [16]'s
system. In the DCT domain, feature vectors are created
directly from (block based) DCT parameters as follows.
Suppose the size of the face model is M blocks by M
blocks and the desired length of the feature vector is N,
then in each DCT block, the lowest d DCT parameters
are used for feature vector creation, where

d = N/(M * M).

This is better illustrated in Fig. 3, in which the lowest d
DCT parameters from each block are stacked up to form
an N-dimension feature vector.

In order to choose the first few low frequencies from
a 2-D DCT block, we number the 64 DCT parameters
according to the typical DCT quantization table design
as reported in [17], i.e., the larger the quantizer is, the
less important its corresponding DCT parameter is for
low frequency representation of the picture. In Eq. 1, we
show the positions of the first 16 parameters.



(1) 
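
The feature vector construction can be sketched as follows. Note the assumption: the coefficient ordering used here is the standard JPEG zigzag scan, chosen only as a stand-in for a low-frequency-first ordering; it is not the quantizer-based numbering of Eq. 1. The function names and the (M, M, 8, 8) array layout are likewise hypothetical.

```python
import numpy as np

def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG zigzag order.

    ASSUMPTION: zigzag is used as a stand-in for the ordering of Eq. 1.
    """
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[1] if (rc[0] + rc[1]) % 2 == 0 else rc[0]),
    )

def feature_vector(dct_blocks, n_features):
    """Stack the lowest d = N / (M * M) coefficients of every block.

    dct_blocks has shape (M, M, 8, 8): an M x M grid of 8 x 8 DCT blocks.
    """
    m_rows, m_cols = dct_blocks.shape[:2]
    d = n_features // (m_rows * m_cols)
    rows, cols = zip(*zigzag_order(dct_blocks.shape[2])[:d])
    return dct_blocks[:, :, list(rows), list(cols)].reshape(-1)

# A 5 x 5 block model with N = 300 keeps d = 12 coefficients per block.
blocks = np.zeros((5, 5, 8, 8)) + np.arange(64.0).reshape(8, 8)
vec = feature_vector(blocks, 300)
```

Any other low-frequency-first ordering can be substituted by replacing `zigzag_order`; the stacking step stays the same.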



Based on the complexity analysis in Section 2, the
face detection problem can be processed adequately over a
rectangular window of 19 pixels by 19 pixels; therefore, the
feature vector length in this work should be in the range of
200-400 dimensions. For example, if a 5 block by 5 block
face model is to be created, and we choose the feature
vector length to be 300, then for each DCT block, its lowest
12 DCT parameters are used to create the feature vector
for training and classification.

In principle, this approach of choosing the lowest
few DCT parameters in each block amounts to downscaling
the windowed image to a unit model size, say
19 pixels by 19 pixels, as used in the pixel domain. In
other words, to tell whether a windowed image pattern is a
face or not, in the pixel domain, the face is downscaled to
the unit model size, e.g., 19 by 19 pixels, and compared
with the face model. In the DCT domain, the
correspondence is that the low-frequency DCT parameters in
each block of the windowed image pattern are used to
form a feature vector, which is then compared with the
face model. In this mapping, as long as the lowest few
DCT parameters have maintained the image features
at the resolution of 19 by 19 pixels (approximately)^3,
we have reason to believe that those parameters have
maintained the necessary information for face detection,
based on our complexity analysis in Section 2. The
benefit of this domain mapping is that, unlike the bilinear
downsampling in the pixel domain, choosing the lowest
few DCT parameters is an easier and better way to do
downsampling in the DCT domain. This is also noticed
and discussed by Dugad and Ahuja [4] in their paper on
fast DCT domain downsampling design.

^3 That is, if the image is decoded with only these low-frequency
parameters and then downscaled to 19 by 19 pixels, the image
quality is still comparable with directly downscaling the original
image in the pixel domain.

3.2 Face Detection System Design

Based on the analysis in Section 2, a multi-model DCT
domain face detection system is designed that combines
algorithm capabilities and performance efficiencies.

3.2.1 Multi-Model Detection System

Figure 4: Illustration of the face detection procedures
in the DCT domain.

Figure 5: Illustration of the face model training proce-
dures in the DCT domain.



Due to the block quantization problem as discussed in
Section 3.1.2, model-based face detection in the block-
based DCT domain faces two problems, i.e., the aligning
problem and the scaling problem.

For the aligning problem, because the block size is 8
pixels, there are in total 64 possible spatial setups at
every searching position in the DCT domain. To solve
the aligning problem, one might train 64 models for one
model size, with each one representing a face pattern
at a different aligning position. However, this approach
is obviously not efficient because 64 models are hard to
store as well as to apply. In addition, there are
redundancies in the 64 models as the spatially neighboring
models are similar to each other. Therefore, a trade-off
has to be made between model efficiency and model
accuracy. In addition, to overcome the scaling problem
in the DCT domain, multiple models have to be created
in order to detect faces in different scales, which further
increases the burden of choosing too many models in
one scale.

In this system, we create face models in six scales, 
and the model windows are designed to be squares in 
side length of 5, 6, 7, 8, 9, and 10 DCT blocks. That is, 
faces in the range of 40 by 40 pixels to 80 by 80 pixels are 
covered by the system. For each scale, one face model
is trained for faces in all the possible aligning positions,
i.e., the model represents variations in different face
features as well as in different aligning positions (all the
64 possible positions).

The processing procedures for each model size are
generally the same as those in the pixel domain (refer to
Fig. 1), except that the work is moved to the DCT
domain. We illustrate the DCT domain detection proce-
dures and model training procedures separately in Fig. 4 
and Fig. 5. The details of the processing modules are 
discussed in the following sections. 

3.2.2 Preprocessing and Masking in the DCT Domain 

To remove the signal variance introduced by different
illuminations and different grayscale dynamic ranges, a
number of preprocessing steps are used in the works
designed in the pixel domain. In the DCT domain, we
find it also possible to implement their counterparts.
More specifically, our system includes the following
preprocessing steps:

1. Masking. Similar to Sung [16]'s work, we
introduce binary face masks on top of DCT blocks. For
face patterns, these masked blocks often represent
background regions. Removing them from the
feature vectors ensures that the subsequent modeling
work does not wrongly encode any unwanted
background structures. Based on different sizes of the
face model, different masks are designed to
represent different spatial resolutions. In Fig. 7, we
show the six masks we used in our system for face
models in six different scales. Note that the face
models are actually in different sizes. They are
scaled to the same size in Fig. 7 for ease of
illustration. Each block in the model windows is of
size 8 pixels by 8 pixels.

2. Illumination Linear Factor Correction. In
order to remove the shadowing effect, a 2-dimensional
linear function is fitted to the DC plane of the face
region in the DCT domain. The fitted function is
then removed from the DC plane.

3. Histogram Equalization. This nonlinear
process is applied directly to the DC components of
the face region DCT blocks. In addition, the AC
components in each block are changed linearly to
reduce the block effects. That is, for each DCT
block, if its DC component d is mapped to d'
according to the histogram equalization, then its AC
components a_i are also mapped linearly to a'_i, with
a'_i = a_i * d'/d.
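
Steps 2 and 3 above can be sketched as follows for an M x M grid of 8 x 8 DCT blocks. This is an illustrative sketch only: adding the mean back after removing the fitted plane (so that DC values stay away from zero), the rank-based approximation of histogram equalization, and all function names are assumptions made for the example, not details taken from the system described here.

```python
import numpy as np

def remove_linear_illumination(dc):
    """Fit a plane a*x + b*y + c to the DC plane and remove it.

    ASSUMPTION: the mean level is added back so DC values stay positive.
    """
    h, w = dc.shape
    yy, xx = np.mgrid[0:h, 0:w]
    design = np.column_stack([xx.ravel(), yy.ravel(), np.ones(h * w)])
    coef, *_ = np.linalg.lstsq(design, dc.ravel(), rcond=None)
    fitted = (design @ coef).reshape(h, w)
    return dc - fitted + fitted.mean()

def equalize_dc(dc, out_max=255.0):
    """Rank-based approximation of histogram equalization on DC values."""
    flat = dc.ravel()
    ranks = np.argsort(np.argsort(flat))               # 0 .. n-1
    return ((ranks + 1) / flat.size * out_max).reshape(dc.shape)

def preprocess(blocks):
    """blocks: (M, M, 8, 8) DCT coefficients, with DC at [..., 0, 0]."""
    out = blocks.astype(float).copy()
    dc = remove_linear_illumination(out[:, :, 0, 0])
    new_dc = equalize_dc(dc)
    # AC coefficients follow the DC mapping: a'_i = a_i * d' / d
    scale = np.divide(new_dc, dc, out=np.ones_like(dc), where=np.abs(dc) > 1e-9)
    out *= scale[:, :, None, None]
    out[:, :, 0, 0] = new_dc
    return out
```

Both steps touch only one value per block before the final rescaling, which is why this preprocessing is cheap compared with its pixel-domain counterpart.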

Fig. 6 shows an example of preprocessing (the
picture is taken from the Olivetti face database^4). Fig. 6(a)
is a cropped face region, (b) is the face region with a
grayscale linear factor removed, and (c) is the final
result of face region preprocessing. We can see that the
shading in the original face is effectively removed (in
subfigure (b)). In addition, the histogram equalization
based on DC components introduces a certain blocking
effect, but the general quality is acceptable (in subfigure
(c)).
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Figure 6: An example of preprocessing results on block 
based DCT domain 
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Figure 7: Illustration of binary face masks design for 
models in different sizes. 



3.2.3 Distribution Based Classifier Design

For face detection purposes, a number of classifiers have
been used. These include: unimodal Gaussian [12],
multimodal Gaussian [16], neural networks [15, 10] and
support vector machines [14]. Among them, neural networks
and support vector machines have been shown to have
theoretical merits for high dimensional vector
classification problems. In particular, SVM is proved capable of
finding the optimal classification boundary in that it can
minimize structural risk [3]. However, both neural
networks and SVMs are hard to train, i.e., it is hard to
organize the positive and negative training samples in order
to make the classifier converge to the optimal status.

In this work, we have to train multiple face models
for faces in multiple scales; therefore, we choose to use a
multimodal Gaussian model as the classifier, which is a
trade-off between model training complexity and
classification performance.

The essence of the multimodal Gaussian model is to
approximate the feature vector distribution with a number
of Gaussian clusters. Though the Gaussian distribution is a
general purpose statistical model that has been
extensively used, the unimodal Gaussian distribution is shown
by Sung [16] to be inadequate to model the distribution of
face patterns in the high dimensional feature space. The
multimodal Gaussian model is a natural extension of
unimodal Gaussian models that has worked well in complex
and high dimensional distribution problems. It also has
moderate training complexity as compared with neural
networks and SVMs. In this work, in order to improve
classification performance, we use two groups of
multimodal Gaussian models to approximate separately the
distribution of face patterns and nonface patterns.

^4 http://www.cam-orl.co.uk/facedatabase.html



Positive Training Samples. The basic idea of this
approach is to approximate the distribution of face feature
vectors with high dimensional Gaussian clusters. To
study the distribution of face patterns in the DCT
domain, 300 frontal faces are collected from various sources,
such as the MIT face database, Yale face database, Olivetti
face database, University of Stirling face database^5 and
some anchorperson images from the NBC news video
database stored at AT&T research lab. Four feature
points of each face, i.e., the inner and outer corners of
both eyes, are marked manually. Each face is moved
and scaled to align these feature points with specific
positions in a model window. The face image is then
cropped with the model window, and converted to the
DCT domain. After that, the feature vector is obtained
from the DCT parameters as described in Section 3.1.3.
Because we train only one model for one modeling
window size, feature vectors should also be created to
represent face patterns whose feature points are not perfectly
aligned with the face model. In this work, we use 16 out
of 64 possible spatial aligning positions for each
training face to generate training samples. In addition, to
make use of the symmetric property, each training face
is flipped and used as a new training sample. Therefore,
in this system, in total 16*2*300 = 9600 feature vectors
are used as positive training samples.

^5 Most of them are downloadable from
http://www.cs.rug.nl/~peterkr/FACE/face.html
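
The way one aligned face yields several positive samples can be sketched as follows. The text does not state which 16 of the 64 positions are used, so the regular 4 x 4 grid of offsets (a step of 2 pixels in each direction) is purely an assumption of this example, as are the function and parameter names.

```python
import numpy as np

def training_windows(image, top, left, size_px, step=2, block=8):
    """Yield shifted and mirrored crops of one aligned face.

    ASSUMPTION: offsets 0, 2, 4, 6 in each direction give 16 of the 64
    possible alignments relative to the 8 x 8 block grid.
    """
    for dy in range(0, block, step):
        for dx in range(0, block, step):
            crop = image[top + dy: top + dy + size_px,
                         left + dx: left + dx + size_px]
            yield crop              # shifted sample
            yield crop[:, ::-1]     # its left-right mirror image

face_image = np.random.default_rng(0).random((60, 60))
samples = list(training_windows(face_image, top=4, left=4, size_px=40))
samples_per_face = len(samples)          # 16 offsets * 2 (flip)
total_samples = samples_per_face * 300   # over 300 faces
```

Each crop would then be converted to block DCT coefficients and turned into a feature vector as described in Section 3.1.3.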

Clustering Algorithm. We cluster the face feature
vectors into six clusters of Gaussian distribution. The
clustering algorithm used here is similar to the K-means
algorithm, except that the Euclidean distance measure
is replaced by a logarithmic Gaussian distance measure.
If we denote each cluster with a Gaussian distribution
$N(v_i, C_i)$, a new feature vector's distance to each
cluster is defined to be a logarithmic Gaussian distance as:

$$d = \frac{1}{2}\left(N \ln 2\pi + \ln|C_i| + (v - v_i)^T C_i^{-1} (v - v_i)\right),$$

where N is the dimension of the feature vectors.
Because N is a rather high dimension in this problem
(200 ~ 400), we actually use the Karhunen-Loeve
transform to reduce the above equation into a lower
dimension problem as:
where $y_i$ are the principal components and $\lambda_i$ are the
eigenvalues. M is the number of eigenvectors used to
approximate the system. Generally $M \ll N$.
The detailed clustering steps are as follows. 

1. Initialize the clustering process by grouping the
feature vectors into six groups in Euclidean
distance space, i.e., a vector is put into the group
whose center is the closest to it among the six
groups. The covariance matrix of each cluster
is initialized to be the identity matrix.

2. Re-compute the data center of each cluster to be
the center of the current cluster partition.

3. Based on the current cluster centers and
covariance matrices, re-assign the data partition by
assigning each feature vector to the cluster that is
closest to it in the logarithmic Gaussian distance
space. If the difference between the new data
partition and the old one is bigger than a threshold
and the inner loop count (Step 2 and Step 3) is less
than the maximal count, go to Step 2; otherwise
go to Step 4.

4. Re-compute the covariance matrices of the 6
clusters based on the current data partition.

5. Based on the current cluster centers and
covariance matrices, re-assign the data partition by
assigning each feature vector to the cluster that is
closest to it in the logarithmic Gaussian distance
space. If the difference between the new data
partition and the old one is bigger than a threshold
and the outer loop count (Step 2 to Step 5) is less
than the maximal count, go to Step 2. Otherwise,
return the created mean vector and covariance
matrix of each cluster.

Figure 8: Average faces of the six face clusters when the
face model size is 5 blocks by 5 blocks.
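
The five steps above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the implementation used in this work: it uses the full logarithmic Gaussian distance instead of the Karhunen-Loeve reduced form, the initial centers are passed in by the caller because their choice is not specified in the text, and the small `ridge` term added to each covariance for numerical stability, the fraction-of-changed-labels convergence test, and all names are hypothetical.

```python
import numpy as np

def log_gaussian_distance(x, mean, cov):
    """0.5 * (N ln 2*pi + ln|C| + (x - m)^T C^-1 (x - m)) for each row of x."""
    n = x.shape[1]
    diff = x - mean
    _, log_det = np.linalg.slogdet(cov)
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return 0.5 * (n * np.log(2.0 * np.pi) + log_det + maha)

def assign(x, means, covs):
    """Label each row of x with the cluster of smallest log-Gaussian distance."""
    dists = np.stack([log_gaussian_distance(x, m, c)
                      for m, c in zip(means, covs)], axis=1)
    return np.argmin(dists, axis=1)

def cluster(x, init_means, max_inner=20, max_outer=20, tol=0.0, ridge=1e-3):
    x = np.asarray(x, dtype=float)
    means = np.array(init_means, dtype=float)   # ASSUMPTION: caller-supplied centers
    k, dim = means.shape
    covs = np.stack([np.eye(dim)] * k)          # Step 1: identity covariances
    sq = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    labels = np.argmin(sq, axis=1)              # Step 1: Euclidean grouping
    for _ in range(max_outer):
        for _ in range(max_inner):
            for j in range(k):                  # Step 2: re-compute centers
                if np.any(labels == j):
                    means[j] = x[labels == j].mean(axis=0)
            new_labels = assign(x, means, covs) # Step 3: re-assign
            changed = np.mean(new_labels != labels)
            labels = new_labels
            if changed <= tol:
                break
        for j in range(k):                      # Step 4: re-compute covariances
            members = x[labels == j]
            if len(members) > dim:
                # ASSUMPTION: ridge keeps the covariance invertible
                covs[j] = np.cov(members, rowvar=False) + ridge * np.eye(dim)
        new_labels = assign(x, means, covs)     # Step 5: re-assign
        changed = np.mean(new_labels != labels)
        labels = new_labels
        if changed <= tol:
            break
    return means, covs, labels
```

With identity covariances the inner loop behaves like ordinary K-means; the covariance update in Step 4 is what lets elongated clusters be modeled in later outer iterations.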

In Fig. 8, we give the clustering results on the 9600
face feature vectors. The model size used is 5 blocks by
5 blocks. Fig. 8(a) to (f) are the average faces of the six
clusters at the time of convergence. Because only the
4 lowest-frequency DCT parameters (the choice of 4 will
be discussed later in this section) in each block are used
for feature vector creation, the block effect is noticeable.
But in general the six mean faces represent mainly the
variances in face textures rather than those in different
spatial aligning positions.

Negative Training Samples. In our experiments, we
find many face-like "nonface" patterns in our testing
examples that cannot be simply separated from face
patterns by thresholding. In order to reduce the
misclassification rate, we further create a multimodal Gaussian
model for these face-like negative patterns.

The training samples are collected in a bootstrap
fashion. That is, the positive face model is first
created by the clustering algorithm as previously discussed.
Based on this model, a face detector is designed, which
is then applied to the training pictures. The nonface
patterns misclassified as faces by the face detector are
used as negative samples.

In this way, we collect about 9000 negative nonface
samples and cluster them into eight clusters. The
clustering algorithm is the same as the one used for the
positive face samples. The clustering result for a model
size of 5 blocks by 5 blocks is shown in Fig. 9, in which
each subfigure represents the average nonface pattern of
one of the eight clusters.

Classification The classification is based on
the distances from the input feature vector
to the positive and the negative clusters. Let us
denote the input vector as $v$, the six positive clusters as
$N(C_k^{(p)}, v_k^{(p)})$, $k = 1, 2, \ldots, 6$, and the eight negative
clusters as $N(C_k^{(n)}, v_k^{(n)})$, $k = 1, 2, \ldots, 8$. Then a
6-dimensional positive distance vector can be defined as




Figure 9: Average nonfaces of the eight nonface clusters 
when the face model size is 5 blocks by 5 blocks 




$$d^{(p)} = \left(d_1^{(p)}, d_2^{(p)}, \ldots, d_6^{(p)}\right),$$

where

$$d_k^{(p)} = \frac{1}{2}\left(N \ln 2\pi + \ln|C| + (v - v_k^{(p)})^T C^{-1} (v - v_k^{(p)})\right), \tag{2}$$

where N is the dimension of the feature vector v. In
practice, because the input feature vectors are
high-dimensional, the covariance matrix C is always
decomposed with the KL transform:

$$C = T \Lambda T^{-1},$$

where T is the eigen-matrix and $\Lambda$ is the diagonal
matrix of eigenvalues. With only the first M eigenvectors
of the eigen-matrix T used to span the feature space,
the distance in Eq. 2 is decomposed into two parts: the
distance in the feature space (DIFS) and the distance
from the feature space (DFFS),



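$$\mathrm{DIFS} = \frac{1}{2}\left[M \ln 2\pi + \sum_{k=1}^{M} \ln|\lambda_k| + \sum_{k=1}^{M} \frac{y_k^2}{\lambda_k}\right],$$

$$\mathrm{DFFS} = \frac{1}{2}\left[(N - M) \ln 2\pi + \left(\ln|C| - \sum_{k=1}^{M} \ln|\lambda_k|\right) + \frac{\epsilon^2}{\rho}\right],$$

where $y_k$ are the principal components, $\epsilon^2 = \|v\|^2 - \sum_{k=1}^{M} y_k^2$
is the residue, and $\rho$ is a weighting factor based on
the estimation of the eigenvalues $\lambda_k$ ($k = M, M+1, \ldots, N$).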
In this work, the distance measure is therefore defined
as:

$$d = \frac{1}{2}\left[N \ln 2\pi + \ln|C| + \sum_{k=1}^{M} \frac{y_k^2}{\lambda_k} + \eta \frac{\epsilon^2}{\lambda_M}\right], \tag{3}$$

where $\eta$ is an adjustable weighting factor.
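
The split of the Gaussian distance into DIFS and DFFS can be illustrated with the sketch below. It is an assumption-laden example rather than the code used in this work: the function name `difs_dffs` is hypothetical, the residual weight `rho` is simply passed in by the caller, and the input vector is centered on the cluster mean before projection.

```python
import numpy as np

def difs_dffs(v, mean, cov, m, rho):
    """Return (DIFS, DFFS) using the m leading eigenvectors of cov."""
    diff = np.asarray(v, dtype=float) - np.asarray(mean, dtype=float)
    n = diff.size
    eigvals, eigvecs = np.linalg.eigh(np.asarray(cov, dtype=float))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lead = eigvals[:m]
    y = eigvecs[:, :m].T @ diff                # principal components
    residual = max(diff @ diff - y @ y, 0.0)   # energy outside the feature space
    log_det = np.sum(np.log(eigvals))          # ln|C|
    two_pi = np.log(2.0 * np.pi)
    difs = 0.5 * (m * two_pi + np.sum(np.log(lead)) + np.sum(y ** 2 / lead))
    dffs = 0.5 * ((n - m) * two_pi + (log_det - np.sum(np.log(lead))) + residual / rho)
    return difs, dffs
```

When `m` equals the full dimension the residual vanishes and DIFS alone equals the full distance of Eq. 2, which gives a quick sanity check for an implementation.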

Similarly, an 8-dimensional negative distance vector is
defined as $d^{(n)} = (d_1^{(n)}, d_2^{(n)}, \ldots, d_8^{(n)})$, with respect to
the eight negative clusters.

Therefore, the classification problem is reduced from
a high-dimensional problem (N = 200 ~ 400) to a
lower-dimensional problem with N = 14, which is then solved
with a simple minimal distance classification algorithm
in our work, i.e., if

$$\min_{k = 1, \ldots, 6} d_k^{(p)} < \min_{k = 1, \ldots, 8} d_k^{(n)},$$

the pattern is detected as a face; otherwise it is a
nonface.
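
The minimal distance rule can be sketched as below. This is only an illustration under stated assumptions: it evaluates the full Gaussian distance of Eq. 2 for every cluster instead of the truncated eigenvector form, and the names `gaussian_distance` and `is_face` as well as the (mean, covariance) tuple layout are hypothetical.

```python
import numpy as np

def gaussian_distance(v, mean, cov):
    """Eq. 2: 0.5 * (N ln 2*pi + ln|C| + (v - mean)^T C^{-1} (v - mean))."""
    diff = np.asarray(v, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    maha = diff @ np.linalg.solve(cov, diff)
    return 0.5 * (diff.size * np.log(2.0 * np.pi) + logdet + maha)

def is_face(v, positive_clusters, negative_clusters):
    """Face if the closest positive cluster is nearer than the closest negative one."""
    d_pos = [gaussian_distance(v, m, c) for m, c in positive_clusters]
    d_neg = [gaussian_distance(v, m, c) for m, c in negative_clusters]
    return bool(min(d_pos) < min(d_neg))
```

With six positive and eight negative clusters, `d_pos` and `d_neg` correspond to the 6-dimensional and 8-dimensional distance vectors described above.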



Figure 10: Illustration of the relation between the fea- 
ture vector length and the divergence of the face and 
nonface patterns. 



In implementing the classifier on practical face images,
we tried feature vectors of a variety of lengths.
Based on the complexity analysis in Section 2,
the complexity of the face detection problem requires
feature vectors of 200 to 400 dimensions. However, we
notice that when feature vectors coming from various
spatial alignment positions are included as positive
training samples (due to the block quantization problem),
the high-frequency DCT parameters become unstable in
both face model training and face detection. In order to
illustrate this problem, an experiment is carried out to
measure the separability between the 9600 positive
samples and the 9000 negative samples under different
feature vector lengths.

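The measure we use here is the divergence measure
as defined in [6]. The divergence of two Gaussian
distributions $N(v_1, C_1)$ and $N(v_2, C_2)$ is given as

$$\mathrm{Div} = \frac{1}{2}(v_1 - v_2)^T \left(C_1^{-1} + C_2^{-1}\right)(v_1 - v_2) + \frac{1}{2}\mathrm{tr}\left[(C_1 - C_2)\left(C_2^{-1} - C_1^{-1}\right)\right]. \tag{4}$$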
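
As an illustration, the divergence of Eq. 4 can be computed directly from the two means and covariance matrices. The sketch below is not taken from this work; the function name `gaussian_divergence` is hypothetical and both covariance matrices are assumed to be invertible.

```python
import numpy as np

def gaussian_divergence(mean1, cov1, mean2, cov2):
    """Divergence between N(mean1, cov1) and N(mean2, cov2) as in Eq. 4."""
    m1 = np.asarray(mean1, dtype=float)
    m2 = np.asarray(mean2, dtype=float)
    c1 = np.asarray(cov1, dtype=float)
    c2 = np.asarray(cov2, dtype=float)
    inv1, inv2 = np.linalg.inv(c1), np.linalg.inv(c2)
    diff = m1 - m2
    mean_term = 0.5 * diff @ (inv1 + inv2) @ diff          # separation of the means
    cov_term = 0.5 * np.trace((c1 - c2) @ (inv2 - inv1))   # mismatch of the covariances
    return float(mean_term + cov_term)
```

The measure is zero for identical distributions and symmetric in its two arguments, so a larger value indicates better separability between the face and nonface groups.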

To study the influence of the feature vector length on
the separability of the problem, both the 9600 positive
samples and the 9000 negative samples are projected onto
the six positive clusters, which generates two groups of
feature vectors in Gaussian distribution (in $R^6$). Their
divergence is then computed according to Eq. 4. In
this experiment, we used a model size of 5 blocks by 5
blocks (40 by 40 pixels). The feature vector length is
varied from 23 to 230, or from 1 parameter per DCT block
to 10 parameters per DCT block. The corresponding
divergence is computed and the relation is illustrated
in Fig. 10.

From Fig. 10, we notice that the divergence increases
as the feature vector length grows from 1 to 5 (parameters
per block), and then drops when the feature vector
length increases further. That is, beyond this point, the
more DCT parameters are included in the feature vector,
the less separable the face patterns are from the
nonface ones. This problem comes mainly from the
high-frequency components in each DCT block. When we
try to build one model to represent face patterns in
all the 64 different spatial alignment positions (with
respect to the current model window), the high-frequency
Figure 11: Several detection results of our texture-based
face detection algorithm (in the compressed domain)

parameters in the feature vectors experience large
variation from sample to sample, which makes them
contribute negatively to the separation problem. As
indicated by Fig. 10, for the specific case of a model size
of 5 blocks by 5 blocks, the ideal feature vector length
should be 4 to 5 DCT parameters per block, or about 100
parameters for the entire feature vector.

Based on this observation, we determine the feature
vector lengths for all the 6 models in our system, which
are listed in Table 1. Note that the feature vector lengths
are all shorter than those used for pixel-domain
detection. This difference is mainly due, as has already
been pointed out, to the block quantization problem in
the DCT domain.

3.3 Experiments 

The texture-based face detection algorithm is tested on
a variety of pictures, which include the CMU database^,
the CMU online face database^, key frames of news video
clips from CNN and NBC, as well as pictures downloaded
from the Internet and scanned from magazines and
books. Though coming from different sources and
formats, the pictures are all compressed into JPEG format
with Adobe Photoshop 5.0 (quality option medium,
quantizer index 3) before being processed. Some detection
results are shown in Fig. 11.

Compared with the pixel-domain detection algorithms,
our algorithm is less accurate because of block
quantization; in particular, the false detection rate is
relatively high. This can be seen from Fig. 11. To overcome
this problem, we seek to combine the texture-based
algorithm with a color-based algorithm.

'*http://www.ius.cs.cmu.edu/IUS/eye3ja8rl7/har/harl/usrO/har/faces/test/

http://www.ius.cs.cmu.edu/IUS/usrpO/har/FaceDemo/gallery-inline.html



4 Combined Color-Based and Texture-Based Face Detection
in the DCT Domain

4.1 Generating Color Similarity Map 

Color-based face detection in the DCT domain was
first discussed by Wang [18]. In this work, our basic
design is similar to theirs, except that we do not try to
set up a threshold at the color detection stage.

We assume the color pictures in JPEG and MPEG I-frames
are compressed in 4:2:0 format. The pictures
are partially decoded to restore their DCT parameters
in the block structures. The skin color is modeled and
detected at the macroblock level. That is, in each
macroblock, the DC components of the Cb and Cr blocks
are used as the average chromatic feature vector. A
Gaussian model $N(v_s, C_s)$ for the skin color is created
by training over a manually labeled picture database.
Based on this skin color model, a color similarity map
is generated for each picture at the macroblock level.
For macroblock (i, j), if we denote its color feature
vector as $v_{(i,j)}$, then its skin color similarity map entry
is

$$color(i,j) = -\frac{1}{2}\left[\ln 2\pi + \ln|C_s| + (v_{(i,j)} - v_s)^T C_s^{-1} (v_{(i,j)} - v_s)\right]. \tag{5}$$
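
A vectorized sketch of the similarity map is given below for illustration. It assumes that the Cb and Cr DC coefficients have already been extracted into two arrays with one entry per macroblock; the function name `skin_similarity_map` and its argument layout are hypothetical and not part of the original system.

```python
import numpy as np

def skin_similarity_map(cb_dc, cr_dc, skin_mean, skin_cov):
    """Evaluate Eq. 5 for every macroblock; inputs are 2-D arrays of DC values."""
    feats = np.stack([np.asarray(cb_dc, dtype=float),
                      np.asarray(cr_dc, dtype=float)], axis=-1)   # shape (rows, cols, 2)
    diff = feats - np.asarray(skin_mean, dtype=float)
    cov = np.asarray(skin_cov, dtype=float)
    inv = np.linalg.inv(cov)
    # Mahalanobis term (v - v_s)^T C_s^{-1} (v - v_s) for each macroblock.
    maha = np.einsum("...i,ij,...j->...", diff, inv, diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (np.log(2.0 * np.pi) + logdet + maha)
```

Larger (less negative) entries mark macroblocks whose chrominance is closer to the trained skin color model.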

4.2 Color Constrained Face Pattern Detection 

With the color map available, the face detection problem
is further extended from the previous texture domain
to the color domain. That is, given an input color
picture in its YCbCr format, we can apply the
texture-based pattern detection on the Y component and the
skin-color map based pattern detection on the CbCr
components^. Because both detection designs have a
statistical expression, it is easy to combine them within a
statistical framework, in either a parallel or a sequential
structure. Though theoretically the parallel structure
has the merit of delayed judgment and thus better
detection performance, the sequential structure is simpler
and faster. In addition, the color-based detection module
is not balanced against the texture-based detection
module in terms of detection accuracy and
reliability*. Therefore we choose the sequential
structure in our system design. A skin color map is
first created by the color analysis module and used
as follows. Given a windowed image
pattern W, its average skin-color similarity

$$\frac{1}{|W|} \sum_{(i,j) \in W} color(i,j)$$

(where color(i,j) is defined in Eq. 5 and |W| is the number
of macroblocks in W) is compared with
a threshold T. Only the windows with an average
similarity higher than the threshold are further processed by
the texture analysis module as discussed in Section 3.
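
The color gating stage of the sequential structure can be sketched as follows. This is a simplified illustration, not the original implementation: the function name `candidate_windows` is hypothetical, the window is slid one macroblock at a time, and only the top-left positions of the surviving windows are returned for the texture stage.

```python
import numpy as np

def candidate_windows(similarity_map, win_rows, win_cols, threshold):
    """Return top-left positions of windows whose mean similarity exceeds the threshold."""
    sim = np.asarray(similarity_map, dtype=float)
    rows, cols = sim.shape
    selected = []
    for top in range(rows - win_rows + 1):
        for left in range(cols - win_cols + 1):
            window = sim[top:top + win_rows, left:left + win_cols]
            if window.mean() > threshold:
                selected.append((top, left))   # passed on to the texture analysis
    return selected
```

Because windows that fail the color test are never passed to the texture module, this stage also reduces the amount of texture analysis that has to be done.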

^Because the skin-color similarity map is a map of scalar val- 
ues, it is straight-forward to design a face pattern detection algo- 
rithm based on this similarity map, which should be, in principle, 
the same as the one we have designed on the texture map. 

*In this work, we do not spend time to create an example 
based face pattern on top of the skin-color similarity map as we 
do in Section 3 on top of the texture map (though it is possible). 
Table 1: Feature vector length at different model sizes



This combination, though simple, performs better than both
texture-only and color-only algorithms. Compared with
texture-only algorithms, the color similarity map offers
an additional constraint, which eliminates false alarms
in backgrounds without a skin-color appearance and
also improves the processing speed. Compared with
color-only algorithms, the texture analysis module helps
reduce false alarms introduced by skin-like backgrounds.
In addition, the texture analysis module is also useful
in locating faces when skin-like backgrounds are close
to actual faces, in which case the simple shape analysis
modules commonly used in color-based algorithms
always fail.

4.3 Experiments 

We tested our face detection algorithm based on combined
texture and color information on many pictures.
The testing picture set used in Section 3.3 is also
used here, except that the CMU database and the CMU
online database are excluded because they contain
grayscale pictures. In Fig. 12, we show some detection
results on pictures from various sources.

To evaluate the performance of our algorithm
quantitatively, we use the key frames of one day's NBC Nightly
News (Feb. 18, 1999, 586 frames in total) as our testing
data set. Within this data set, there are 42 faces, and 36
of them are frontal and upright ones that are within the
coverage of our texture-based face model. The detection
performance is listed in Table 2. In Table 2, the new
combined texture and color based face detection
algorithm (column a) is compared with a color-based
detection algorithm (column b). Because the color-based
algorithm works in the pixel domain, while the combined
texture and color based detection algorithm works in the
compressed DCT domain, the comparison is focused on
the detection performance, not on the spatial location
accuracy of the detected faces.

As we can see from Table 2, the combined texture
and color based algorithm has fewer false alarms than
the color-based algorithm because the texture processing
module has removed some false alarms introduced
by skin-color backgrounds. In addition, fewer faces
are missed by the combined color and texture based
algorithm than by the color-based algorithm. This
difference is mainly due to the different ability of the two
algorithms to detect faces surrounded by skin-like
backgrounds. That is, in color-based algorithms, skin-color
regions are first detected with a skin-color model based
detector and then processed with a shape analysis
module. Only the regions that have specific aspect ratios are
accepted as face regions. When some skin-like back-
                      42     36
face detected         34     30
face missed
false alarm
detection rate        81%    83%
detection accuracy    79%    88%

Table 2: Performance of the combined color and texture
based face detection algorithm

grounds are close to the actual faces, the detected
skin-color regions always get connected, so that the
detected regions no longer have a face-like
shape. In contrast, the texture-based approach is not
influenced by the backgrounds, as long as they do not
look like face patterns in the grayscale sense.

As a result, the combined texture and color based
algorithm exhibits a better detection rate (i.e., the ratio
of correctly detected faces to the total faces that should be
detected) as well as better detection accuracy (i.e., the
ratio of correctly detected faces to the total detected
faces) in this experiment.
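
The two ratios defined above can be written out as a short helper. The sketch assumes that the total number of detected faces is the sum of correct detections and false alarms; the function name `detection_metrics` and the counts in the usage line are hypothetical and are not the values of Table 2.

```python
def detection_metrics(correct, total_faces, false_alarms):
    """Return (detection rate, detection accuracy) from raw counts."""
    if total_faces <= 0 or correct + false_alarms <= 0:
        raise ValueError("counts must describe at least one face and one detection")
    rate = correct / total_faces                    # correct detections / faces present
    accuracy = correct / (correct + false_alarms)   # correct detections / all detections
    return rate, accuracy

# Hypothetical counts, for illustration only.
rate, accuracy = detection_metrics(correct=8, total_faces=10, false_alarms=2)
```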

In terms of speed, the texture analysis module is
the most time-consuming part of our algorithm. However,
its complexity depends on the actual size of the
detected skin-color map. On the 586 CIF-size key frames
mentioned above, the average speed of our algorithm is
about 1 second per frame on a 366 MHz Celeron PC. We
expect to achieve better performance on a faster PC with
some software optimizations.

5 Concluding Remarks 

In this paper we mainly developed a texture-based face
detection algorithm that works in the compressed DCT
domain. This is new work on compressed-domain
processing. Our work is based on previous face detection
work designed in the pixel domain, but we discussed
the major problems we met in the compressed DCT
domain, such as the block quantization problem, feature
vector selection, preprocessing design in the DCT
domain, and the multi-model based system structure. Due to
the block quantization problem, we have to use shorter
feature vectors than the face detection designs in the
pixel domain. Therefore, the proposed texture-based
detection algorithm is not as good as its counterparts
in the pixel domain. To solve this problem, we
proposed to combine the texture-based algorithm with a
face color detection algorithm. Experiments indicate




Figure 12: Face detection examples 



that the combined texture and color based detection al- 
gorithm works better than both texture-only and color- 
only algorithms. 

To sum up, this work is interesting because it is the first
to propose doing traditional pattern detection work in
the compressed DCT domain, which is a promising
research direction for multimedia content analysis. In
addition, in this work we also proposed, for the first time,
to combine texture and color information for face
detection. This is especially useful for multimedia processing,
in which most visual data are in color formats.